On the Limits of Sentence Compression by Deletion
Authors
Abstract
Data-driven approaches to sentence compression define the task as dropping any subset of words from the input sentence while retaining important information and grammaticality. We show that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for in this way. We argue that this is partly due to the lack of appropriate evaluation material and estimate that a deletion model is in fact compatible with approximately 55% of the observed data. We analyse the remaining cases, in which deletion alone failed to provide the required level of compression, and conclude that in those cases word order changes and paraphrasing are crucial. We therefore argue for more elaborate sentence compression models that include paraphrasing and word reordering. We report preliminary results of applying a recently proposed, more powerful compression model in the context of subtitling.
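One way to operationalise the question the paper asks, whether an observed compression can be produced by deletion alone, is to check whether the compressed sentence is a token subsequence of the source. The sketch below is an illustration only, not the authors' code, and assumes a naive whitespace tokeniser.

```python
# Minimal sketch (not the authors' code): an observed compression is
# compatible with a pure deletion model if its tokens appear in the
# source sentence in the same order. Whitespace tokenisation is a
# simplification chosen purely for illustration.

def is_deletion_only(source: str, compression: str) -> bool:
    """Return True if `compression` can be obtained from `source`
    by deleting zero or more tokens while keeping the remaining order."""
    src = source.split()
    comp = compression.split()
    i = 0
    for token in src:
        if i < len(comp) and token == comp[i]:
            i += 1
    return i == len(comp)

if __name__ == "__main__":
    print(is_deletion_only("she said that she would come tomorrow",
                           "she said she would come"))    # True: deletion suffices
    print(is_deletion_only("she said that she would come tomorrow",
                           "she will come tomorrow"))      # False: needs paraphrasing
```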
Similar references
Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion
We present a substitution-only approach to sentence compression which “tightens” a sentence by reducing its character length. Replacing phrases with shorter paraphrases yields paraphrastic compressions as short as 60% of the original length. In support of this task, we introduce a novel technique for re-ranking paraphrases extracted from bilingual corpora. At high compression rates, paraphrasti...
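A toy sketch of the substitution-only idea, not the paper's system: phrases are replaced with shorter paraphrases so that the character length of the sentence shrinks. The paraphrase table and the greedy application order below are invented for illustration.

```python
# Toy illustration of "tightening" by substitution: apply paraphrases
# that strictly reduce character length. The table is invented for this
# sketch and is not the paper's paraphrase resource.

PARAPHRASES = {
    "in order to": "to",
    "a large number of": "many",
    "at this point in time": "now",
}

def tighten(sentence: str) -> str:
    """Greedily apply substitutions that shorten the sentence in characters."""
    for long_form, short_form in PARAPHRASES.items():
        if len(short_form) < len(long_form):
            sentence = sentence.replace(long_form, short_form)
    return sentence

if __name__ == "__main__":
    s = "we met in order to discuss a large number of proposals"
    t = tighten(s)
    print(t, f"({len(t)}/{len(s)} characters)")
```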
Learning to Simplify Sentences Using Wikipedia
In this paper we examine the sentence simplification problem as an English-to-English translation problem, utilizing a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia. This data set contains the full range of transformation operations including rewording, reordering, insertion and deletion. We introduce a new translation model for text ...
Sentence Compression by Deletion with LSTMs
We present an LSTM approach to deletion-based sentence compression where the task is to translate a sentence into a sequence of zeros and ones, corresponding to token deletion decisions. We demonstrate that even the most basic version of the system, which is given no syntactic information (no PoS or NE tags, or dependencies) or desired compression length, performs surprisingly well: around 30% ...
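A minimal PyTorch sketch of the general idea only, not the paper's model: an LSTM tagger that maps each input token to a keep/delete decision. The vocabulary size, dimensions and the untrained state are placeholders.

```python
# Sketch of deletion-based compression as binary sequence labelling:
# an LSTM reads the token sequence and emits a 0 (delete) or 1 (keep)
# decision per token. Hyperparameters are arbitrary; no training shown.

import torch
import torch.nn as nn

class DeletionTagger(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)  # logits for {delete, keep}

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        states, _ = self.lstm(self.embed(token_ids))
        return self.out(states)  # shape: (batch, seq_len, 2)

if __name__ == "__main__":
    model = DeletionTagger(vocab_size=10_000)
    tokens = torch.randint(0, 10_000, (1, 7))    # one 7-token sentence (random ids)
    decisions = model(tokens).argmax(dim=-1)     # per-token 0 = delete, 1 = keep
    print(decisions.tolist())
```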
Elimination of the Elements of the Sentence in Sahife-ye-Shahi Book
Language always tends toward brevity, that is, toward conveying its intentions with the smallest possible number of words. One consequence of this process is phenomena such as the deletion of sentence components. Poets and writers sometimes omit components of a sentence in order to be concise and, of course, to observe the principles of rhetoric, subtleties and syntactic ...
Using First-Order Logic to Compress Sentences
Sentence compression is one of the most challenging tasks in natural language processing, which may be of increasing interest to many applications such as abstractive summarization and text simplification for mobile devices. In this paper, we present a novel sentence compression model based on first-order logic, using Markov Logic Network. Sentence compression is formulated as a word/phrase del...
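An illustration only, not the paper's Markov Logic Network formulation: compression cast as boolean word-deletion decisions, scored under a toy importance objective with a hard length constraint and solved by brute force. The scoring function and the constraint are invented for this sketch.

```python
# Toy view of compression as constrained boolean deletion decisions:
# enumerate keep/delete assignments, discard those violating a hard
# length constraint, and return the highest-scoring one.

from itertools import product

def compress(tokens, importance, max_keep):
    """Return the kept tokens (in order) under the best admissible assignment."""
    best, best_score = list(tokens), float("-inf")
    for keep in product([0, 1], repeat=len(tokens)):
        if sum(keep) > max_keep:
            continue  # violates the hard length constraint
        score = sum(w for k, w in zip(keep, importance) if k)
        if score > best_score:
            best_score = score
            best = [t for k, t in zip(keep, tokens) if k]
    return best

if __name__ == "__main__":
    tokens = ["the", "committee", "finally", "approved", "the", "new", "budget"]
    importance = [0.1, 0.9, 0.3, 0.9, 0.1, 0.5, 0.9]   # invented weights
    print(compress(tokens, importance, max_keep=4))
```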
Journal:
Volume / issue:
Pages: -
Publication date: 2010